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Loglinear smoothing (LLS) estimates the latent trait distribution while making 
fewer assumptions about its form and maintaining parsimony, thus leading to 
more precise item response theory (IRT) item parameter estimates than stan¬ 
dard marginal maximum likelihood (MML). This article provides the expectation- 
maximization algorithm for MML estimation with LLS embedded and compares 
LLS to other latent trait distribution specifications, a fixed normal distribution, 
and the empirical histogram solution, in terms of IRT item parameter recovety. 
Simulation study results using a 3-parameter logistic model reveal that LLS 
models matching four or five moments are optimal in most cases. Examples with 
empirical data compare LLS to these approaches as well as Ramsav-curve IRT. 
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Unidimensional item response theory (IRT; Lord & Novick, 1968) models 
play an important role in present-day educational measurement and testing. With 
marginal maximum likelihood (MML) estimation of IRT item parameters (Bock 
& Aitkin, 1981; Bock & Lieberman, 1970), the IRT model is expanded to include 
a latent trait distribution, denoted here by g(0). The latent trait is considered to be 
a nuisance parameter in MML and can be eliminated from the likelihood by inte¬ 
grating or marginalizing over it, a strategy that goes back to Neyman and Scott 
(1948). Item parameter recovery studies have shown that there is item parameter 
estimation error when the true g(0) is nonnormal and a normal distribution is 
specified (Boulet, 1996; Stone, 1992; Swaminathan & Gifford, 1983; Woods 
& Lin, 2009; Woods & Thissen, 2006; Yamamoto & Muraki, 1991). Many appli¬ 
cations of IRT use item parameter estimates from a calibration sample in subse¬ 
quent analyses, including trait estimation or equating. Given that there is a 
relationship between the amount of systematic and random error in item para¬ 
meter estimates and the amount of measurement error in estimates from these 
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subsequent analyses (Casabianca, 2011; Casabianca & Lewis, 2011a; Tsutakawa 
& Johnson, 1990; Zhang, 2011; Zhang, Xie, Song, & Lu, 2011), an interest of 
psychometricians is to obtain the most accurate and precise item parameter esti¬ 
mates possible by calibrating with large samples and the appropriate estimation 
specifications. 

Flexibility in model specification for g(Q) is important. Specifically, the 
model for g(0) should be able to accommodate nonnormality while also main¬ 
taining a level of parsimony. Current practice for specifying the latent trait dis¬ 
tribution assumes a normal distribution by default, and this is true in Parscale 4 
(Muraki & Bock, 2003), Bilog-MG 3 (Zimowski, Muraki, Mislevy, & Bock, 
2003), and more current software such as flexMIRT Version 2.0 (Cai, 2013). 
Although it is not always apparent, there are many situations in educational and 
psychological testing, where the population of test takers may have an approxi¬ 
mately normal observed test score distribution but a nonnormal distribution for 
the latent trait (Lord, 1953; Lord & Novick, 1968). An example in educational 
testing where the latent trait may be inherently nonnormal is a group of exami¬ 
nees with disabilities taking a K-12 alternate assessment. Measurement of con¬ 
structs in the field of clinical and personality psychology often involve 
nonnormal traits—for example, psychoticism, tends to be positively skewed 
(Eysenck & Eysenck, 1991; Matthews, Deary, & Whiteman, 2003, p. 22). 

Generally, parsimony is a major consideration in statistical modeling. It is 
especially important in psychometrics with the usage of relatively more complex 
models such as the unidimensional 3-parameter logistic (3PL) IRT model, multi¬ 
dimensional IRT models, and cognitive diagnostic models. These models involve 
many estimable parameters and are typically associated with identifiability 
issues (Haberman, 2005). Therefore, flexible yet parsimonious options for esti¬ 
mation should be considered in these cases, as the ideal model forg(0) sits at the 
optimal point of the bias-variance trade-off. 

The most common fixed specification for g(0) is a standard normal distribu¬ 
tion. A discrete approximation to the normal distribution is used for the purposes 
of numerical integration (Bock & Lieberman, 1970; Mislevy, 1984). The most 
common parametric approach is to estimate the parameters of the normal distri¬ 
bution (Andersen & Madsen, 1977; Mislevy, 1984; Rigdon & Tsutakawa, 1983; 
Sanathanan & Blumenthal, 1978). The most common nonparametric approach to 
model g(0), often called the Bock-Aitkin or the empirical histogram (EH) solu¬ 
tion (Bock & Aitkin, 1981; Mislevy & Bock, 1985; Rigdon & Tsutakawa, 1983), 
estimates Q — 1 distributional parameters that characterize a discrete g(0) using 
latent class probability estimation. 

The focus of this article is a semiparametric model for g(0) via loglinear 
smoothing (LLS; Holland & Thayer, 1987, 2000) in MML. LLS matches M 
moments of the original distribution to create a smoothed distribution. Suppose 
that an LLS model for g(Q) is embedded within an expectation-maximization 
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algorithm (EM; Dempster, Laird, & Rubin, 1977; Tsutakawa, 1985) for the 
MML estimation of item parameters. Given a discretized g(0) on Q support or 
quadrature points, an unsaturated LLS model (with fewer fitted moments than the 
Q — 1 points of the distribution or M < Q - 1) smooths the discrete g(0). Conver¬ 
sely, a saturated loglinear model (fitting Q — 1 moments or M = Q — 1) will 
exactly reproduce the original discrete g(0). 

Other smoothing methods, including LLS, have been used for discrete latent 
trait and mixture 1RT models by Rost and von Davier (1992, 1995), von Davier 
(2005), and Xu and von Davier (2008). Although these smoothing approaches 
have been around for quite some time, they were implemented in the context 
of complex multidimensional discrete latent trait modeling (see, e.g., mdltm soft¬ 
ware; von Davier, 2005). Furthermore, the algorithms used for estimation ofg(0) 
with LLS via the EM algorithm are currently undocumented in the literature and 
these methods have not been evaluated for use in the MML estimation of item 
parameters. We believe that for these reasons, LLS algorithms for g(0) are also 
unavailable in popular 1RT software packages and are not being utilized. Conse¬ 
quently, our goals for this article are to (i) provide the step-by-step algorithms for 
LLS in the EM framework, (ii) evaluate LLS as a method for estimating g(0) in 
MMLE of unidimensional 1RT models with different values for M and Q, and 
(iii) compare LLS to other popular approaches. 

In the next sections, we review MML estimation of item parameters and provide 
an overview of the various methods for characterizing g(0). We then introduce 
LLS as a general method for smoothing distributions and detail the specific algo¬ 
rithmic steps to implement the EM algorithm with LLS as the method for estimat¬ 
ing g(0) in IRT. We report results from an item parameter recovery simulation 
study performed in the context of the 3PL unidimensional dichotomous IRT 
model, focusing on item parameter recovery as a function of M and Q. We also 
present two data examples: one using data from the Programme for International 
Student Assessment (PISA) mathematics assessment and another from the Mauds- 
ley Obsessional Compulsive Inventory (MOCI). Finally, we discuss implications 
for item parameter estimation and suggestions for implementing LLS in practice. 


Characterization of the Latent Trait Distribution in MML Estimation 


MML estimation provides estimates of item parameters by assuming that the 
item response data are obtained from a random sample from a population of 
latent traits with a certain distribution. With MML, the likelihood is expanded 
to include g(0) and then integrated with respect to 0. The marginal probability 
of obtaining a specific response pattern \ s via integration over g(0) is 


PM 


+oo 

[ P(x s \Q)g(Q)dQ 


—oo 
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n P(xj S \Q\ <p) 
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where P(x, s |9;<p) is an IRT model specifying the conditional probability of 
obtaining a score x is for item i (z = 1, ... , k) from response pattern 5 
(5 = I. ... ,S), given 9 and tp, a vector of item parameters. The log marginal 
likelihood for a collection of S response patterns is given by: 


In L=y^ n s ln[P(x,j)] = y 1 


. In 


n p(x,. 


|0;«p) 


^(0)rfe 


( 2 ) 


s 

where n s is the number of test takers with response pattern .5 and n s = N. Note 

s~ 1 

that since each test taker has only one response pattern, the frequencies for each 
response pattern have a multinomial distribution with parameters, N and 
P — [Pi, ... ./ J s’], where P s = P(x s ) as defined by Equation 1. The MML item 
parameter estimates are obtained by maximizing In L in Equation 2. Numerical inte¬ 
gration approximates the integral in Equation 1, as it is not solvable in closed form. 

Bock and Lieberman (1979) introduced MML estimation using Gauss-Hermite 
quadrature to approximate integrals involving the normal distribution (see Stroud 
& Secrest, 1966). Quadrature utilizes values of a function at a finite set of discrete 
points to approximate the full area under the curve. A quadrature weight, W(T q ). 
refers to the height of the function (or in some cases the area of a rectangle) located 
at quadrature point, T q . There are several different forms of numerical quadrature, 
but the standard quadrature in the latent trait context uses fixed points with equal 
spacing and weights based on a rectangle approximation (i.e., the weight for each 
point represents the area of a rectangle located at the point). 

To approximate the integral in Equation 1, quadrature is used as given by. 


Q Q 

P(x s ) =yP(x.,\T q )W(T q ) = y 


9=1 


9=1 


El P(x is \T q : <p) 
i=\ 


W{T q ), 


(3) 


where T q is the quadrature point location and W(T q ) is the weight at quadrature 
point T q {q = 1, ... , Q ). In this equation, the product of the probability of obser¬ 
ving item response pattern x s at quadrature point T q and the weight at quadrature 
point T q are summed across all Q quadrature points. Inserting the log of Equation 
3 into Equation 2 gives the log-likelihood, 


In L = 


s s ( 

y, n s ln[P(xQ] s y n s ln< 

S=1 S=1 ( 


E 


9=1 


k 

TT P(xj S \T q -,<p) 
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(4) 


The EH Method for g(9) 

Since 0 is unobserved and therefore considered missing, Bock and Aitkin 
(1981) used the EM algorithm (Dempster, Laird, & Rubin, 1977) to implement 


550 



Casablanca and Lewis 


MML estimation of item parameters. In the E-step, expectations of sufficient sta¬ 
tistics N q (the expected number of examinees at T q ) and R iq ( the expected number 
of examinees at T q responding correctly to item i) are computed so that item para¬ 
meters may be estimated in the M-step. In the M-step, estimates of the item para¬ 
meters are computed that maximize the expected log likelihood from the E-step. 
These estimates are then used as updated quantities to compute the expectations 
in the next E-step, and so on, until convergence. 

The EH method for g-(0) introduced by Bock and Aitkin (1981) estimates 
weights W(T q ) iteratively, alongside item parameters, in the EM steps to gener¬ 
ate a discrete g(0). In the M-step, W(T q ) are estimated with 

P^(x s \T q -^)W^\T q ) 

Q , (5) 

£pW(^|T ? ;<p)lFW(T ? ) 

.< 7=1 

where P ^ (T ? |xs) is the posterior probability of T q given response pattern 5 and is 
based on item parameter estimates and estimated weights from the previous itera¬ 
tion j (Mislevy & Bock, 1985). 

These Q weights, denoted by A = W(T ) = [W(T i), W(T 2 ),..., W(Tq)], are 
estimated with the item parameters in the M-step and then used in the next itera¬ 
tion. Lewis (1985) provided a concise statement about the M-step in the EH char¬ 
acterization of g(0): 

p(V' +1 ^A (/+1) |cp (, ' ) ,A t/) ) = £-|log/(^X. e]tp^ +1 ), A. (/+n ) |x, <p0'\ A. 0 ' 0 1. (6) 

Equation 6 gives the quantity that is to be maximized with the EH method. It is 
apparent from Equation 6 how the item parameters and the distributional parameters 

are simultaneously estimated. The starting values, <pW, A* 1 , are used in the E-step to 
compute the expected counts, N q 1 ' and R [ iq ] needed to estimate item parameters. Then, 
in the M-step, cp (1) is updated to <p (2 ) by fitting the logistics using <p^ and A (1) . Then, 
A* 11 is updated to A <2) , and both cp (2) and A/"" 1 are used in the next E-step and so on. 
There are identifiability concerns when using the EH method forg(0) with the 3PL 
1RT model (Haberman, 2005), as the number of estimated parameters may be very 
large depending on test length and the number of quadrature points. 


w«+ 1] (T q ) = iX>pM(r ? |x s ) 


Other Approaches to Specify g (6) 

There are a variety of alternative methods to specify a nonnormal latent trait 
distribution, none of which have gained enough popularity to replace the EH 
approach as the most widely used. Xu and Jia (2011) used a parametric approach 
by estimating parameters of the skew-normal distribution in MML (mean, var¬ 
iance, and skewness). Thissen’s Johnson curve approach estimates Johnson curve 
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parameters (Johnson, 1949) along with item parameters in MML yielding curves 
with different combinations of skewness and kurtosis (available in MULTILOG 
7.0 software; Thissen, 2003; van den Oord, 2005). 

Another approach to estimating g(0) involves Davidian curves (DCs), which 
are characterized by the product of a squared polynomial of order k and the 
standard normal density function. To improve estimation and numerical stabi¬ 
lity, Zhang and Davidian (2001) reparameterized the coefficients of this model 
using polar coordinates. Woods and Lin (2009) introduced Davidian curve IRT 
(DC-1RT) with this reparameterized version of DCs. Under this approach, a 
best fitting solution must be selected from a set of 10 possible DCs, each with 
a different u value (with u = 1 the normal model and higher values of u accom¬ 
modating various shapes including skewness and multiple modes). 

Ramsay-curve IRT (RC-IRT; Woods & Thissen, 2006) implements MML 
estimation by using a splines-based approach to estimate a smooth g(0). That 
is, in addition to item parameters, RC-IRT estimates parameters which charac¬ 
terize a RC for g(0) based on piecewise polynomial “B-spline” basis functions 
of a specific order and number of breaks (the points where the B-splines join 
together). The software used to implement RC-IRT provides the user 25 candi¬ 
date RCs from which to choose the “best” solution based on a comparison of a 
series of model fit indices from all possible models starting from the simplest 2- 
2 normal model (“2-2” for knot-order combination), which returns a normal 
distribution, to the 6-6 model, which is order 6 with 6 knots. In addition to 
selecting a model based on these two parameters, the user must use trial and 
error to select the standard deviation (SD) for the multivariate normal prior 
on the spline coefficients. For maximal flexibility in the shape of the distribu¬ 
tion, this is typically set to a high value such as 500 and then reduced if there 
exist estimation issues. Simulation study results show that RC-IRT produces 
item parameter estimates superior to using a fixed parametric normal distribu¬ 
tion, when the true latent trait distribution is nonnormal (Woods & Thissen, 
2006). With recent attention on RC-IRT in the literature, we include this 
approach in our empirical examples. In the next section, we first discuss LLS 
in terms of observed score distributions and then LLS for the latent trait distri¬ 
bution in the context of unidimensional IRT models. 


Loglinear Smoothing 

Holland and Thayer (1987, 2000) introduced LLS as a method for reducing 
irregularities in discrete observed test score distributions. LLS estimates prob¬ 
abilities for discrete distributions based on an unsaturated loglinear model, and 
therefore, only some properties of the observed frequency distribution are pre¬ 
served by fitting M moments. This section discusses LLS models for observed 
and latent distributions. 
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Model Features and Specification for Observed Distributions 

Let 9 be a discrete random variable that can take on only the Q values 
Zi, 72 , • ■ • , Tq, with probabilities p\,pi, ■ ■ ■ ,Po, respectively. LLS estimates 
the probabilities p q from the contingency table of observations of each value 
of 0, Ti, 72 , ... , Tq, using the polynomial loglinear model 

M 

log e (p f ) = Po + I>'» 7 7’ ( 7 ) 

m=\ 

where T™ is the mth power of T q , and the coefficients p / = [P 1; P 2 , ... , P M ] and 
intercept P 0 are to be estimated from the observed counts n = [n \, ... , hq\. Note 
that p 0 is a normalizing constant forcing the sum of the p q to be 1. The same 
model can be used for the frequencies directly. That is, if p q satisfies a 
loglinear model, then n q will satisfy the same model with p 0 replaced by 
P 0 + log N (Holland & Thayer, 2000). Therefore, note that N = ffn q - 

The main feature of LLS is that it matches sample moments of the observed 
distribution. The maximum likelihood estimates from the model in Equation 7, p, 
force the estimated probabilities to satisfy moment-matching conditions put forth 
by the model specification (M) and the observed distribution (Holland & Thayer, 
2000). The maximum likelihood estimates P,„ satisfy the property that 

= ( 8 ) 

i ■? 

that is, the sample moments of 0 match the theoretical moments under the fitted 
model. The degree of smoothness (or actually “roughness”) in LLS is deter¬ 
mined by the highest power M of T q in Equation 7. For A/ = 0 (i.e., not including 
T q in the model at all), LLS maximally smooths the p q 's, estimating them as a 
uniform distribution. For M sufficiently large, the loglinear model in Equation 7 
is saturated and LLS estimates the p q as the natural method of moments estima¬ 
tors n q /N. Indeed, for M = Q — 1, the polynomial on the right-hand side of 
Equation 7 will be an interpolating polynomial for the log p q 's. Once the para¬ 
meters are estimated, the estimated probabilities are computed to characterize the 
smooth fitted distribution. 

LLS for the Latent Trait Distribution 

The LLS model for latent distributions is the same as Equation (7); however 
the values of T\ , 77, ... , Tq now represent values for quadrature points (or latent 
trait levels) and the estimates of p q , quadrature weights, or W(T q ). Under this for¬ 
mulation, there are M parameters estimated for the latent distribution. Since the 
expected frequencies of test takers at each T q are unobserved, the EM algorithm 
includes the LLS procedure to estimate p. 
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Consider the number of parameters estimated forg(0), and then compare the 
M moments estimated with LLS to the Q — 1 quadrature weights estimated with 
the EH method. That LLS can result in a substantial reduction in the number of 
parameters estimated when used instead of the traditional EH method is a fact 
that cannot be understated. This reduction occurs when M< Q. In terms of model 
parsimony for any test form, LLS can be used with any value of O and still only 
M parameters are needed to characterize g(9). Estimating g(0) with LLS permits 
the user to estimate nonnormal latent trait distributions without assuming a func¬ 
tional form and without estimating many additional parameters. In addition, LLS 
allows the user flexibility to control under- and overfitting by manipulating M. In 
this sense, a M- moment LLS model for g(0) provides a “sweet spot” for the 
bias-variance trade-off, where the levels of bias and variance are optimized, 
resulting in the least error. 


Choosing M and Q 

The more complex the distribution represented by the p q 's, the more moments 
may be needed. For example, a model that fits two moments can exactly capture a 
discretized normal distribution. By increasing the number of fitted moments, 
additional properties of discrete distributions will be captured (e.g., when 
M = 3, skewness is recovered, etc.). In the saturated case, where M = Q — 1, the 
distribution is perfectly fit. For this scenario, the model is not a smoothing model. 
Univariate observed score distributions typically need four or more fitted 
moments to be adequately characterized (Holland & Thayer, 2000); however, 
there are some differences in the number of moments appropriate for fitting a 
latent trait distribution (compared to an observed distribution). Cressie and Hol¬ 
land (1983) showed that under a Rasch model with both points and weights of 
g(0) estimated, the distribution of the latent trait is determined only up to its first 
k moments. Specifically, when k, the number of items, is small, then the use of 
many moments is not supported for latent trait distributions, as the amount of 
information that can be used in determining probabilities from the distribution 
is limited. 

In terms of choosing Q, generally, the larger the value of Q , the finer the char¬ 
acterized distribution, or the more closely the estimated distribution will approx¬ 
imate a continuous distribution. It may be inferred that the more points specified 
for the discrete distribution, the better the approximation to g(0) and thus better 
MLEs for the item parameters. Therefore, it is desirable to have a large 0 in order 
to capture the complexities in the distribution, but the benefit may taper. The 
maximal number of discrete points needed when estimating both points and 
weights of g(0) has been discussed for the Rasch model—it was found that 
approximately Q = k/2 is sufficient (De Leeuw & Verhelst, 1986; Lindsay, 
Clogg, & Grego, 1991). Tzamourani and Knott (2002) found in tests with k 
ranging from 3 to 21 that the 2PL model needs fewer than Q = k/2. Importantly, 
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Tzamourani and Knott (2002) also showed in an empirical example that item 
parameter estimates resulting from varying levels of Q do not differ by much, and 
discrimination parameter estimates get smaller as Q decreased. They note that 
this effect on the discrimination parameter estimates is likely a function of coar¬ 
sening the scale upon which the slope can be appropriately estimated—it is more 
challenging to discriminate examinees across fewer points or when they are cate¬ 
gorized into fewer ability levels. 

Note that with LLS, we estimate only M parameters in addition to item para¬ 
meters, and therefore, the value of Q has no influence on parsimony. In other 
words, we may choose a relatively larger value of Q to characterize a fine version 
of g(0) and still only need to estimate M parameters. 


The EM Algorithm With LLS 

To implement LLS forg(O) within the MML estimation of item parameters, 
the loglinear model estimation must be embedded within an EM algorithm 
(Tsutakawa, 1985; von Davier, 2005; Xu & von Davier, 2008). The E-step is 
identical for the EH method and LLS, thus details on the E-step may be found 
elsewhere (Mislevy & Bock, 1985). Therefore, for brevity, we present only the 
M-step portion of the EM algorithm using LLS for the 3PL model, and the 
1PL and 2PL IRT models may also be fitted using variations of this algorithm. 
We provide the full EM algorithm with LLS in the Appendix. 


M(aximization) Step: Stage 1 

For each item separately, solve the maximum likelihood equations given in 
Equations 9, 10, and 11, using the expected counts N q and R iq computed in the 
E-step. Note that g, is the item intercept, a, is the slope, and c,- is the guessing 
parameter for item i. 


Q 


g,. = —a,b, : 0 = ^2(R iq - PiqN q )H iq , 

(7=1 

(9) 

Q 

a i ■■ 0 = y^XRjq - PiqNq)H lq T q , 

0=1 

(10) 

Q 

c , : 0 = ( 1 - C,) -1 53 (Riq - P iq Nq)/Piq • 

9=1 

(11) 


n_ a)p* (i — p* ) 

Here, H iq = — p ' (1 “' p ^ and P* q = v|/(a {T q + g,), the logistic function evalu¬ 
ated at a,T q + g ( . Note that Equations 9, 10, and 11 treat R iq and N q as known 
frequencies, but in fact, the posterior expected values are used instead. 
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M(aximization) Step: Stage 2 

Begin a Newton cycle either by using a set of starting values for p 11 for the 
first iteration v = 1 in a series of Newton cycles (as described by Holland and 
Thayer, 1987, pp. 14-15) or by using P <l) estimated from the previous cycle if 
v > 1. Note that there may be J iterations of the EM cycles, and within each 
EM cycle, there may be V iterations of Newton cycles. 

(a) Compute a vector of fitted frequencies defined by: 


fW' 


APexpf^lt'Zi") 

\m= 1 / 

~Q 7m V’ 

Eexp £0i V) d 

q =1 \m =1 / 


^exp(f^pWz-) 

\m= 1 / 

~Q T~M 7’ 

£ex P 

< 7=1 \m= 1 / 


^exp^gPir^ 

’1? 7m 7 ’ 

£ex P f>wz; 

q =1 \m =1 / 

( 12 ) 


where elements of Z (a Q x M matrix) are standardized T q values, taken to the 
powers 1 through M. 

(b) Compute an estimated variance-covariance matrix for this vector defined by 

1 = Df 1 — 1 f7) fC/j ; w here is a diagonal matrix with the elements of 
in the diagonal. 

(c) Using a singular value decomposition, solve for 8 1 ' * in (Z'2fZ)8 <l 1 = Z'(n — f), 

where Z has columns that are vectors of (standardized) T q values, taken to the 
powers 1 through M and n' = [Ni,N 2 , N 0 \ is the vector of expected frequen¬ 

cies computed in the E-step. 

(d) Compute a new vector of coefficients p ,1 + l) by adding the vector of changes to 
the vector of coefficients from the previous Newton cycle or the starting values if 
in the first Newton cycle: p (, + l1 = p ,l| + 8 (l K 

(e) With the estimates P* 1 +1 \ and the predefined (and fixed) quadrature point values, 
compute a vector f ^ 11 of new estimated counts at each quadrature point. 

(f) Use the maximum absolute difference between the elements of the original i‘ J: 
and to determine convergence. If this difference is less than 0.0001 x N 
(or some other preferred convergence criterion), then convergence for the esti¬ 
mated frequencies has been reached. If convergence has been reached for the LLS 
algorithm but not for item parameters in Stage 1, then continue the EM iterations 
by returning to the E-step, using f l,+11 in the next E-step for quadrature weights, 
such that W(T q ) = f q +r> /N. If convergence has not been reached within the 
Newton cycles, use p (l + 11 in Step (a) and repeat. 


The final output of the M-step is an updated value for the fitted frequencies, 
which appear as the final quadrature weights, W(T q ) = f q /N. 

The software LLSEM 1.0 ( LogLinear Smoothing Expectation Maximization) 
implements these algorithms (Casabianca & Lewis, 2011b). LLSEM is capable 
of item parameter estimation under the 1PL, 2PL, and 3PL dichotomous IRT 
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models using the normal distribution assumption, the EH method, and LLS. In 
our implementation of LLS, we keep the quadrature points equally spaced and 
fixed throughout the entire estimation procedure to identify the scale (Lewis, 
1985; see the Appendix for more information on this identification method). 
Furthermore, there is no standardizing during the iterations, unlike what occurs 
in many commercial software programs. Instead, we scale item parameter esti¬ 
mates after convergence to the scale of the estimated latent trait distribution. 
Thus, the mean and SD of the resulting distribution are not necessarily 0 and 
1, respectively, as would be the result in many commercial software programs. 
More information on LLSEM is available in the Appendix. 


Simulation Study Design 

To provide an evaluation of LLS’s utility for item parameter recovery, we 
conducted a Monte Carlo simulation study for 1RT item parameter estimation 
and compared LLS to the fixed normal model and the EH method. 1 We varied 
skewness and number of modes of the true g-(0), number of moments fitted, 
and number of quadrature points. Test length and sample size are popular fac¬ 
tors included in item parameter recovery studies; however, they were not 
included here because pilot study work on LLS for IRT has shown minimal 
differences with different sample sizes (n = 500, 1,000, and 2,000) and test 
lengths (k = 25 and 50; Casabianca, Xu, Jia, & Lewis, 2010). We chose to pres¬ 
ent results from a simulation study using the 3PL IRT model because that is the 
model with which there are the most identifiability issues and where parsimony 
is of the utmost importance. 1PL simulation study results can be found in Casa¬ 
bianca (2011) and the empirical examples found in the following sections use 
the 2PL model. 


Item Response Generation Conditions 

Fifty item responses were generated for 1,000 simulated examinees from each 
of the following true latent trait distributions, each with mean and variance equal 
to 0 and 1, respectively: (a) standard Normal density, A(0,1); (b) negatively 
skewed continuous distribution with a skewness of — 1.5, created using a mixture 
of two normal distributions (p, = —1.259, a\ = 1.791, p 2 = 0.315, cr 2 = 0.307, 
mixing probabilities m = 0.2, and 1 - m = 0.8); and (c) bimodal continuous dis¬ 
tribution, also created as a mixture of two normal distributions (pj = —0.705, 
= 0.254, p 2 = 1.058, ct 2 = 0.254, and m = 0.6). One hundred (100) replica¬ 
tions of 50,000 item responses (50 items for 1,000 examinees) were generated for 
each of these three g(0) conditions. Figure 1 plots the densities for the three popu¬ 
lation latent trait distributions. Based on sample estimates from the 100 replications 
of 1,000 0s, the skewness and kurtosis of the negatively skewed g(0) were —1.5 and 
3.2, respectively. For the bimodal g(0), estimates of these moments were 0.3 and 
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Normal g(0) 



Negatively Skewed g(0) 



Bimodal g(0) 



FIGURE 1. True population latent trait distributions. 

— 1.0. The negatively skewed g(0) was comparable in terms of the magnitude of 
skewness to other simulation studies (Casabianca, Xu, Jia, & Lewis, 2010; 
Woods, 2008; Woods & Lin, 2009; Woods & Thissen, 2006). The mixture 
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distribution parameters for the bimodal g(0) were chosen to be consistent with pre¬ 
vious literature that also examined item parameter recovery for a bimodal g(0) 
(Woods & Lin, 2009). 

Item responses were simulated using calibrated 3PL items from a 2008 
National Mathematics Assessment (Item parameters were provided by ETS; 
Copyright (C) 2011 ETS. www.ets.org). The distribution of the difficulty para¬ 
meters was approximately normal (symmetric) but centered around .2. The mean 
a parameter was 1.13 (SD = 0.25), mean h parameter was 0.21 (SD = 0.51), and 
mean c parameter was 0.16 (SD = 0.05). 


Calibration Conditions 

Each data set was calibrated in LLSEM with the following specifications 
for g(0): fixed normal distribution (Mislevy & Bock, 1985), EH method (Mis- 
levy & Bock, 1985), and LLS (Casabianca, Xu, Jia, & Lewis, 2010; Holland & 
Thayer, 1987, 2000; von Davier, 2005; Xu & von Davier, 2008). There were 
three levels of the number of quadrature points: Q = 11, 31, and 61. Eleven 
quadrature points were chosen to be close to the default in BILOG-MG, which 
is 10 (Zimowski et al., 2003). In addition, we chose a higher number not 
exceeding the test length (31) and a higher number exceeding the test length 
(61). The last level of Q was selected to support a comparison to other studies 
that used very large Q (e.g., O = 121; Woods & Lin, 2009; Woods & Thissen, 
2006). 

The levels of M were specified in a nesting structure according to O. For all 
levels of Q, LLS models matching M = 2, 3, 4, and 5 moments were fit. These 
levels were chosen for two reasons. First, what the first few moments can capture 
can be hypothesized and conceptualized, higher moments probably cannot. Sec¬ 
ond, based on work with observed test score distributions, it has been noted that 
at least four or five moments are typically needed to adequately characterize a uni¬ 
variate distribution (Holland & Thayer, 2000). Beyond the five-moment model, M 
varied with Q in such a way that there was not an excessive number of conditions, 
but to include enough levels that questions regarding the effect of moment match¬ 
ing on item parameter recovery could be addressed. We aimed to find the optimal 
point in the bias-variance trade-off. We fitted the 10-moment model for Q = 31 
and the 10- and 15-moment models for Q = 61. Hereinafter, we will refer to LLS 
models using their M value—for example, LLS5 is the five-moment LLS model. In 
total, we report results for 63 conditions or 21 conditions per type of true latent trait 
distribution (normal, negatively skewed, and bimodal). 


Outcome Measures 

Item parameters are the sole focus of this simulation study. We compared item 
parameter estimates to the true item parameters using bias and root mean square 
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error (RMSE) criteria. We computed the average RMSE using the square root of 
the average mean square error across the 50 items. We also used a multivariate 
measure of error to assess overall recovery of item parameters called the mxD as 
used by Woods (2008). Specifically, this measure is the absolute difference 
between the item characteristic curve (ICC) using estimated item parameters and 
the ICC using the true item parameters, computed for each item across the Q 
quadrature points. Within condition, item, and replication, the maximum of these 
absolute differences (over the Q quadrature points) was determined. The mean of 
the absolute maximum difference was taken across the 50 items, and the mean 
was also taken across replications. 


Simulation Study Results 

Normal g(9) 

There was very little error in the normal case to start with! The mxD values for 
the LLS models (0.044-0.046) were the same as the EH results (0.045-0.046; see 
Table 1 for these mxD values). Differences between LLS models (across values 
of M) were negligible, and there was no pattern of increasing or decreasing as a 
function of the number of moments. There was no difference between Q = 31 
and Q = 61, since virtually all model specifications yielded the same amount 
of error, on average. However, for 0= 11, the average RMSE with the fixed nor¬ 
mal distribution specification was 0.01 higher than all other conditions. This 
difference appeared when examining differences in average RMSEs for the indi¬ 
vidual item parameters. That is, compared to the Q = 31 and Q = 61 conditions, 
the Q = 11 condition yielded about 0.03 higher average RMSE for the a and 
b parameters when using a fixed normal distribution for g(0). 

Negatively Skewed g (8) 

For the negatively skewed g(0), mxD was always smallest (for all Q ) under 
LLS4 (0.046-0.047). The EH mxDs were about 0.003-0.007 greater than the 
optimal LLS4 model. 

Figure 2 provides profile plots per parameter estimate and for the true nega¬ 
tively skewed (left column of plots) and bimodal (right column of plots) latent 
trait distributions. These plots show the trajectory for error for the three different 
Q levels, as we increase in model complexity. Here, x is Q = 11,0 is 0 = 31, 
and A is Q = 61. We excluded the profile plots for the true normal g(0) because 
there were very few differences between models. Of the LLS models. Figure 2 
shows that the average RMSEs were largest for LLS2, decrease from LLS3 and 
LLS4, and then showed either a slight increasing trend or tapered off as M 
increased. The a parameter was recovered best with LLS4 with Q = 11. The low¬ 
est average RMSEs were found with LLS4 for Q = 11 and the LLS5 for Q = 31 
and 61. For the b parameters, LLS4 was the best model (for all values of O). 
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FIGURE 2. Average root mean square error plots for a, b, and c item parameter estimates 
for negatively skewed and bimodal g(9) conditions. The left column of plots is for 
the negatively skewed conditions and the right column for the bimodal conditions. 
LLSM = M-moment LLS model; EH = empirical histogram; Normal = fixed normal 
distribution assumption. 

Finally, there was no trend for the c parameter. Figure 2 shows no distinction 
between specifications, although there was a slight dip in error in LLS 10 and 
LLS15. 
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Negatively Skewed Conditions 
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FIGURE 3. Item bias plotted as a function of true item parameters for 50 items in simula¬ 
tion study under the normal, EH method, and four-moment loglinear smoothing (LLS) 
models when Q = 31. The top, middle, and bottom rows show the bias for the a, b, and 
c parameters, respectively. The left column ofplots is for the negatively skewed conditions 
and the right column for the bimodal conditions. LLS4 = four-moment LLS model; 
EH = empirical histogram; Normal = fixed normal distribution assumption. 
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There was no difference between 0=31 and 61. However, when Q = 11, 
the fixed normal distribution assumption yielded a mxD of 0.089, which is 
higher than mxD = 0.046 yielded by LLS4. This difference is reflected in 
Figure 2 in the a and b parameters. Under LLS3, the Q = 31/61 conditions 
yielded larger average RMSEs than Q = 11 for a parameters. For both the a and 
b parameters, the average RMSEs were slightly larger for the Q = 11 condition 
under LLS5. 

The left column of plots in Figure 3 depict item-level bias for the negatively 
skewed condition for the a, b, and c parameters (going down the column) under 
the fixed normal assumption, EH, and LLS4 with Q = 31. For brevity, only LLS4 
was shown here (in filled circles) because it has been shown throughout the 
results to be the top performer. LLS4 yielded consistent positive bias across the 
a parameter scale with fluctuations around 0. EH and especially the fixed normal 
model showed a downward bias as a increased. Bias in the b parameters was usu¬ 
ally smaller with LLS4 than the other models—this was especially true at the 
ends of the scale. In a couple of instances, the LLS4 model yielded approxi¬ 
mately 0 bias, while EH yielded about 0.1 (in both the positive and negative 
directions). All three models had similar levels and trends for bias for the c para¬ 
meter: positive bias for lower c parameters and negative bias for higher c para¬ 
meters. LLS4 had relatively smaller biases with higher c parameters, although 
the highest true c in this simulation was only .25. 


Bimodal g(9) 

For the bimodal condition, the mxD ranged from 0.045 to 0.078 when Q = 11 
and 0.050 to 0.072 when Q = 31/61. The absolute least error was obtained with 
Q = 11 under LLS4 (mxD = 0.045). All LLS models performed similar to EH 
except for LLS2 and LLS3, which yielded more error. Although differences 
between the models were minimal, the mxD and the average RMSEs for LLS4 
were always smaller (mxD difference = 0.003) than the EH results. 

All three types of item parameters also showed the smallest average RMSEs 
under LLS4 (see Figure 2). There was an increase in error between LLS2 and 
LLS3 and then a large decrease in error between LLS3 and LLS4. After 
M = 4, the mxD and the average RMSEs increased, but only slightly. 

Differences between levels of Q were not straightforward. The two higher lev¬ 
els of Q were identical and there were inconsistent differences between 0=11 
and Q = 31/61. For example, for all models but the LLS3 model, mxD was 
smaller when Q= 11. Conversely, for LLS3, mxD was larger when 0=11. This 
is also shown on the profile plots (Figure 2) for each of the item parameters. 

The right column of Figure 3 shows item-level bias plots for the bimodal con¬ 
dition. In the top plot, the bias for the a parameters increases with the true a value 
for the fixed normal distribution assumption and EH, but this trend was not true 
for LLS4. Instead, bias tended to be smaller under LLS4. For the lower b 
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FIGURE 4. Estimated latent trait distributions with Q = 61 from Programme for 
International Student Assessment mathematics data (Shanghai-China sample). LLS4 = 
four-moment LLS model; LLS5 = five-moment loglinear smoothing (LLS) model; 
EH = empirical histogram method; RC 3-2 = 3-2 Ramsay-curve item response theory 
solution; Normal = fixed normal distribution assumption. 


parameters (between —0.7 and 0), the LLS4 model yielded smaller (positive) bias 
than the other models. Toward the center of the b scale, the three models con¬ 
verged. Larger positive bias was found for the smaller c parameters (0.07 to 
0.18), and LLS4 yielded the smallest of these biases. As c increased, bias for all 
models converged around 0. 

Empirical Illustration Using the PISA Mathematics 
Assessment—Shanghai Sample 

The PISA is an international study conducted by the Organization for Eco¬ 
nomic Cooperation and Development that assesses 15-year-old students in 
mathematics, reading, and science. Shanghai-China outperformed all other 
countries in all three PISA subject assessments in 2012. We analyzed 11 
dichotomously scored mathematics items from a subset (n = 1,623) of the 
Shanghai-China sample (n = 5,177) to demonstrate the 2PL EM algorithm 
with LLS for g(Q) and compare it to other methods. 2 Four calibration meth¬ 
ods (fixed normal assumption, EH, LLS, and RC-IRT) were compared using 
LLSEM 1.0 and RCLOG v.2 (Woods, 2006b). We set the bounds of the dis¬ 
tribution to (—4, 4), used 61 quadrature points, and fitted LLS models with 
M = 4 and 5. 

In our RCLOG specifications, we set the SD for the multivariate normal prior 
on the spline coefficients to 500 for maximal flexibility in the shape of the 
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distribution. We fitted all possible RC-IRT models with order up to 6 and number 
of breaks up to 6 and selected the best fitting model based on the consideration of 
several criteria as described in Woods (2006a). The criteria are the log-likelihood 
LogL, the Akaike information criterion (AIC; Akaike, 1973), the Bayesian infor¬ 
mation criterion (BIC; Schwarz, 1978), the Hannan-Quinn criterion (HQ; Han¬ 
nan, 1987), and the Kolmogorov-Smimov test for normality (KS; Kolmogorov, 
1933; Smirnov, 1939). The models with the smallest values of the LogL, AIC, 
BIC, and HQ signal the best fit. Woods (2006a) notes that it is common for these 
various criteria to yield different models; however, the current analysis revealed 
consistency across three of these four indices. That is, the 3-2 model was selected 
by BIC, HQ, and AIC. The LogL selected the 6-6 model. Both of these models 
are significantly different from normal model by the KS test. We proceed with 
the comparison of models using results from 3-2 RC-1RT model solution. 

Figure 4 provides the latent trait distributions from the MML estimation of 
item parameters for the PISA items. 3 The thicker solid curve is the discretized 
normal distribution. The thinner curves are distributions from EH, LLS, and 
RC-IRT, all of which should capture varying degrees of nonnormality, if any 
nonnormality should exist. Indeed, the estimated latent distributions all exhibit 
moderate positive kurtosis and moderate negative skewness. (For reference, 
consider statistics for the 3-2 RC: M = 0.000, SD = 1.001, skewness = 
—.884, kurtosis = 4.004.) While they all share similar shapes, the RC 3-2 and 
EH distributions are more leptokurtic than the LLS distributions. These distribu¬ 
tions indicate a population of examinees that tend to perform relatively higher on 
the latent trait scale, which is expected from the Shanghai-China sample. 

The profile plots of the a and b item parameter estimates from the PISA anal¬ 
ysis in Figure 5 provide a visual depiction of the differences between estimates 
by method for g(0). The x-axes on these plots arrange items sorted ascending by 
the normal parameter estimate. When examining the top plot of Figure 5, we 
see that all methods with the exception of the normal model yield very similar 
a parameter estimates. Only with the most discriminating item (#5) was there a 
difference between RC-IRT and the other methods. In this case, RC-IRT 
yielded a marginally higher estimate (2.28 vs. ~2.16 for the other methods). 
Note that although we observed a slight difference in the distributions for 
EH/RC-IRT and LLS solutions, the item parameters here are basically equiva¬ 
lent. There were inconsistent differences between the normal model estimates 
(black line and diamond symbol) and the other methods, in that for some items, 
the normal estimate was higher and for other items lower. This may be related 
to the corresponding difficulty estimates for these items (see the bottom plot of 
Figure 5). For easy items, the normal model yielded larger b estimates com¬ 
pared to the other methods. 

Although the LLS4 and EH models were very similar in parameter recovery as 
per the simulation results, it was technically the LLS4 model that yielded the 
most accurate ICC, when g(0) was nonnormal (shown via the MxD statistic in 
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FIGURE 5. hem response theory (IRT) parameter estimates plotted hi profiles by method 
for g(0) for 11 mathematics items from the 2012 Programme for International Student 
Assessment, Shanghai-China sample. The a and b estimates appear in the top and bottom 
plots, respectively. Items are sorted in ascending order by the normal parameter estimate. 
LLS4 = four-moment loglinear smoothing (LLS) model; LLS5 = five-moment LLS model; 
EH = empirical histogram method; RC 3-2 = 3-2 Ramsay-curve IRT solution; Normal = 
fixed normal distribution assumption. 
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FIGURE 6. Estimated latent trait distributions with Q = 61 from Maudsley Obsessional 
Compulsive Inventory wash item data. LLS4 = four-moment LLS model; LLS5 = five- 
moment LLS model; EH = empirical histogram method; RC 3-2 = 3-2 Ramsay-curve item 
response theory (IRT) solution; RC 2-4 = 2-4 Ramsay-curve IRTsolution; Normal = fixed 
normal distribution assumption. 


Table 1). The EH MxDs were always larger but by a very small order. It is rea¬ 
sonable to cautiously consider the LLS4 estimates here the best of the normal, 
EH, and LLS models. However, we must remind you that the distributions in 
the simulations and in this example differ, and the IRT models differ, making 
a direct correspondence between this example and the simulations less than 
straightforward. 


Empirical Illustration Using the MOCI 

We used data from an administration of the MOCI (Hodgson & Rachman, 
1977) to a sample of undergraduates enrolled in an introductory psychology class 
at the University of North Carolina (n = 1,080) to provide an additional demon¬ 
stration of LLS. The MOCI is a multidimensional true/false measure of obses¬ 
sive-compulsive symptoms. In this illustration, we use 9 items from the MOCI 
which together are unidimensional and comprise the “wash” subscale (Woods, 
2002). The subscale includes items such as “My hands do not feel dirty after 
touching money.” True responses to these items indicate a lack of obsession/ 
compulsion with washing, and therefore, respondents with negative latent trait 
values have higher levels of obsession and compulsion and the opposite is true 
as 0 increases. We calibrated the items with the 2PL IRT model under the normal, 
EH, LLS4, LLS5, and RC-IRT with Q = 61 and the bounds of the distribution set 
to (—4, 4). We fitted all possible RC-IRT models yielding 25 candidate RCs 
using a SD equal to 500 for the multivariate normal prior on the spline 
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FIGURE 7. Item response theory (IRT) parameter estimates plotted in profiles by method 
for g(d) for 9 wash items from Maudsley Obsessional Compulsive Inventory. The a and b 
estimates appear in the top and bottom plots, respectively. Items are sorted in ascending 
order by normal parameter estimate. LLS4 = four-moment LLS model; LLS5 = five- 
moment LLS model; EH — empirical histogram method; RC 3-2 = 3-2 Ramsay-curve IRT 
solution; RC 2-4 = 2-4 Ramsay-curve IRT solution; Normal = fixed normal distribution 
assumption. 
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coefficients. The best fitting model according to the BIC and HQ was the 3-2 
model. The A1C selected the 2-4 model and LogL selected the 4-3 model. We 
continued with the 3-2 and 2-4 models in our analysis because the A1C, HQ, and 
BIC are preferred, as they are penalized likelihood criteria. 

Figure 6 shows the series of distributions all with some degree of positive 
skew and kurtosis (3-2 RC: M = 0.000, SD = 1.002, skewness = 0.729, kurtosis 
= 3.801). The deviations from normality are clear when comparing these to the 
normal distribution. Interestingly, the EH distribution exhibits a very small sec¬ 
ondary mode on the right side—the other methods did not reveal this. It is true 
that at some point, LLS would also show this same mode, but it is unknown with 
what value of M. Aside from that difference found in the EH distribution, all 
other estimated distributions share the same shape. 

Figure 7 reveals only small differences between item parameters yielded by 
the models estimating g(0). The RC 3-2 and EH a estimates were almost always 
larger than as from LLS and RC 2-4. The normal a estimates were always larger, 
and substantially. This difference between the normal model and the other meth¬ 
ods increased as the a value increased. The b parameter estimates based on the 
normal model ranged from 0.75 to 2.3. There were basically no differences 
between b estimates for the easier items; however, the normal estimates were 
always smallest and RC 3-2 bs were also always just slightly smaller than LLS 
and EH bs. As items become more difficult, the difference between the normal 
bs and the other bs increased. Slight differences between the methods estimating 
g(0) also appear as b increases. 

Differences between the normal model item parameter estimates and esti¬ 
mates from the other methods for g(0) are consistent with the item parameter 
recovery results found in the simulations. We know that the normal model does 
not perform well when g(0) is nonnormal. Therefore, the differences observed 
here point toward using a nonparametric or semiparametric approach for g(0). 


Discussion 

Our goals were to provide the LLS/EM algorithm for IRT item parameter esti¬ 
mation, evaluate its effectiveness when g(0) is nonnormal and with different val¬ 
ues for M and Q, and compare it to other methods. This research investigated IRT 
item parameter recovery using various methods for specifying g(0) when the true 
g(0) was normal, negatively skewed, and bimodal. For the negatively skewed 
and bimodal g(0) conditions, LLS models matching four or more moments 
slightly outperformed or matched the performance of the EH method. In most 
cases, the differences in these average errors were small; however, we observed 
substantial differences in bias at the item level. For example, Figure 3 shows that 
when g(0) was negatively skewed, an item with true b = 1.1 had 0 bias under 
LLS4 but bias of 0.1 under EH. Similar differences are observable in Figure 3 
for both nonnormal g(0) conditions. Differences of this order are substantial, 
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especially considering the subsequent uses of parameter estimates (e.g., test score 
equating) and the need for the values entered into these analyses to be precise. 

It was generally true that somewhere along the M scale, the amount of error 
from item parameter estimation came to a stable minimum. In other words, 
there was a point at which additional moments did not contribute to the char¬ 
acterization of g(0) and thus did not contribute to the estimation of item para¬ 
meters. Furthermore, in some instances, additional moments actually led to 
poorer quality item parameter estimates, most likely due to overfitting. These 
asymptotic points occurred at different levels of M for each true g(0). For the 
normal g'(O), there was a very small dip in error at the two-moment mark, and 
from the three-moment model on, error was consistently larger (though not by 
much). Flowever, for both nonnormal distributions, this minimum certainly did 
not occur at two moments! For the particular negatively skewed g(0) examined 
in this study (skewness = —1.5), the four-moment model yielded the least 
error. Similarly, for the bimodal g-(0) modeled in this study, the two- and 
three-moment models yielded more error, and then error dipped to a mini¬ 
mum at four moments. After four moments, increasing the number of 
moments produced a gradually increasing trend in error. The dip in error rep¬ 
resents the optimal point, where bias and variance together lead to the least 
error. These optimal points are inherently specific to the distributions mod¬ 
eled in this study. 

While there was virtually no difference in item parameter recovery for Q = 31 
and Q = 61, there were some effects to the discrimination parameter between 
Q = 11 and the higher O conditions. Using more points to represent the distribu¬ 
tion (keeping the range of the distribution fixed) should impact the estimation of 
the slope. That is, more points characterize a finer g(0), and keeping all else con¬ 
stant, estimates of the slope should increase as Q increases ( see Casabianca, 2011 
or Tzamourani & Knott, 2002). This was in fact true with the normal g(0) con¬ 
dition. There was negative bias when Q = 11, which disappeared when Q = 31/ 
61. Finally, the fixed normal distribution assumption yielded differences in error 
between levels of Q. Specifically, there was a difference between the higher lev¬ 
els, (9 = 31/61 and (2=11. With the fixed normal distribution assumption, error 
was larger in Q = 11 for the negatively skewed condition and error was smaller in 
Q = 11 for the bimodal condition. Inconsistencies in the results are likely attri¬ 
butable to the conflation between the true g'(O), Q, and M and their interactions. 
Additional factors including the range of the distribution and the spacing between 
quadrature points would also affect recovery of the latent trait distribution and 
item parameters. 

The data examples provide a comparison of item parameters and latent distri¬ 
butions generated by a series of methods including RC-IRT for two data sets 
which both involved distributions with moderate skewness and kurtosis. The 
simulation and real data results were consistent: with the exception of the normal 
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model, differences between models forg(0) were mostly minor. We did observe 
many similarities between LLS and EH in both data sets, but we know from 
simulations that smoothing results in more accurate item parameter estimates. 


LLS Versus Other Methods for g (6) 

It is obvious why any nonparametric or semiparametric approach forg(0) may 
be preferred over the fixed normal model—these models will properly model 
deviations from normality in the characterization of g(0) thereby yielding more 
precise item parameter estimates. The problematic part about the flexibility of 
the most popular nonparametric approach, the EH approach, is that it involves 
an estimation procedure requiring possibly many additional parameters. For long 
tests and for complex models, where many item parameters are being estimated, 
this may lead to estimation and identifiability issues (Haberman, 2005). LLS 
offers a unique advantage over the EH method by estimating only the M moments 
necessary to capture nonnormality, making the LLS solution more parsimonious. 
We showed in our simulations that in addition to improvements in parsimony, 
LLS actually yields better item parameter estimates in terms of RMSEs and 
MxDs. However, we must remind the reader that this result is strictly applicable 
to the data under study. 

The payoff for using LLS is potentially great. For example, suppose we have a 
50-item test to be calibrated with the 3PL with Q = 61. Under the normal model, 
only 150 parameters are estimated. However, under EH, we estimate 150 + 60 = 210 
parameters. With a five-moment LLS model, we can obtain the same or bet¬ 
ter item parameter estimates with 150 + 5 = 155 parameters estimated. The 
number of additional parameters involved with RC-IRT is equal to the sum 
of the order and number of breaks minus two. Therefore, for the 3-2 model 
that was selected for our two empirical examples, we estimated an additional 
3 + 2 - 2 = 3 spline coefficients. For the most complex RC-IRT model, we 
would estimate 6 + 6 - 2 = 10 spline coefficients. 

RC-IRT is a very flexible option. The RCs estimated and selected for analysis 
in our empirical examples appear almost identical to the curves from the EH and 
LLS methods and the resulting item parameter estimates were very similar to 
these methods as well. The major difference between RC-IRT and LLS is in the 
implementation. With RC-IRT, the analyst considers a variety of candidate RC 
curves based on spline functions with different combinations of order and num¬ 
ber of breaks and selects a model based on a series of model fit indices. There is 
also a decision to be made in regard to the SD of the multivariate prior distribu¬ 
tion. Consequently, there are many possible solutions depending on the final SD 
chosen as well as the model selected based on the multiple fit indices, which 
often select different models. In our opinion, the wide range of options makes 
RC-IRT less straightforward than LLS. 
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Further, because it is a splines-based approach, RC-1RT has the potential for 
indeterminacy. The estimation procedure depends on the number and location of 
breaks in the RC used to characterize g(0). There is an upper limit on the number 
of breaks available in RCLOG (2-6 breaks). This dependency may result in estima¬ 
tion issues if there are insufficient breaks or if the breaks appear in inopportune loca¬ 
tions. For example, Woods and Thissen (2006) reported issues with convergence 
because the estimated curve required more quadrature range than was available. 

LLS provides smoothed solutions based on a moment-matching procedure 
that relaxes the potentially overfitted g(9) generated by EFI. It does this all while 
yielding the same or better item parameters with less of an expense in terms of 
the number of estimated parameters. The EH is a special case of the LLS frame¬ 
work (when M= Q — 1). The mathematics/statistics underlying LLS is accessible 
to researchers and analysts and the approach to implementing LLS is straightfor¬ 
ward with minimal consideration of anything but moments. Considering that as 
little as four moments can recover adequately a bimodal distribution, using LLS 
is very simple with minimal decisions to be made that would otherwise generate 
inconsistencies in model selection. LLS would not present many indeterminacy 
issues like RC-IRT because it can easily be implemented with very large Q val¬ 
ues to characterize g(9). 


Suggestions for Use and Final Remarks 

One model that kept reappearing as the best in terms of yielding smaller 
average RMSEs is the four-moment LLS model. Clearly, the two- and three- 
moment models are generally insufficient. We suggest the four- or even the 
five-moment models for the future use of LLS to estimate g(0) in the 3PL 
unidimensional IRT context. A more exhaustive approach to model selection 
would be to first calibrate the items with EH and observe the recovered latent 
trait distribution. The item responses could then be recalibrated with LLS 
using an M selected based on the estimated distribution in the first 
calibration. 

There are some caveats that should be noted in the interpretation of the simu¬ 
lation results. First, there is limited generalizability because we explored only 
two distinct nonnormal latent trait distributions. Further, the different item para¬ 
meters in the 3PL IRT model will have different levels of robustness to poor 
characterizations of the latent trait distribution. For example, if the scale is not 
wide enough (range restriction) or if there is improper representation of the dis¬ 
tribution, then there may be problems estimating the a parameter. Also, it is 
known that there are issues estimating c parameters, and the estimation of the 
a and c parameters are interdependent as the c defines the lower asymptote for 
the ICC. Future research will implement LLS in other contexts such as multiple 
group g(9) estimation, where the characterization of the distributions is the pri¬ 
mary outcome. The benefits of using LLS may be more apparent when using LLS 
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in more complex models, such as multidimensional IRT models, where addi¬ 
tional research with nonnormal latent variable distributions is needed (Cai, 
2010 ). 


Acknowledgment 

The authors are extremely grateful to Brian Junker and three anonymous reviewers for 
their careful reading and useful feedback on this research and article. We are also grateful 
to Carol Woods for providing the MOCI data and for assisting us with the RCLOG 
software. 


Authors’ Note 

A version of this article was presented at the 2012 National Council on Measurement in 
Education annual meeting. 

Declaration of Conflicting Interests 

The author) s) declared no potential conflicts of interest with respect to the research, 
authorship, and/or publication of this article. 

Funding 

The author(s) disclosed receipt of the following financial support for the research, author¬ 
ship, and/or publication of this article. This research is based on the first author’s disserta¬ 
tion carried out at Fordham University and was jointly supported by Fordham University’s 
Distinguished Dissertation Research Fellowship (2009) and by Educational Testing Ser¬ 
vice’s (ETS) Harold Gulliksen Psychometric Research Fellowship (2009). The research 
reported here was also supported in part by the Institute of Education Sciences, U.S. 
Department of Education, through Grant R305B1000012 to Carnegie Mellon University. 
The opinions expressed are those of the authors and do not represent views of the Institute 
or the U.S. Department of Education, Fordham University, or ETS. 

Notes 

1. Unfortunately, we were unable to include Davidian curve-item response the¬ 
ory in our simulations and empirical examples because the software is not cur¬ 
rently functional. Our empirical data examples compare Ramsay-curve item 
response theory (RC-IRT) to loglinear smoothing; however, this method is 
excluded from the simulation study, as the software to implement RC-IRT 
cannot be batched processed and the source code is unavailable. 

2. We selected a subset of the Shanghai-China sample that was administered 
booklet numbers 3, 7, 9, and 10, yielding responses to 11 dichotomous items. 

3. To facilitate the comparison of distributions, we rescaled the quadrature 
points for the LLSEM -based distributions (LLS4, LLS5, empirical histogram 
[EH], and Normal) to ensure a mean of 0 and variance of 1.0. However, 
because each latent distribution has a different mean and variance at the end 
of the estimation procedure, the points do not perfectly align. This may be 
resolved by interpolation. 
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